Part 2
This document shows an exemplary downstream analysis on immune gene
expression using the SingleCellAlleleExperiment (SCAE)
package. The used dataset is under controlled access and not for public
usage. The here shown results are part of the masters thesis:
Development of a multi-layer R object to study immune gene expression in spatial (and single-cell) transcriptomics at allele, gene and functional level written by Jonas Schuck.
The project was conducted in the research group ‘Computational Immunology’ lead by Dr. Katharina Imkeller at the Edinger Institute at the University Hospital Frankfurt. Dr. Imkeller provided supervision of the project. Primary corrector is Prof. Dr. Florian Buettner at Goethe University Frankfurt.
This document gives only a brief introduction to the
SingleCellAlleleExperiment (SCAE) package as well as an
overview for the performed steps described in the
package application section of the thesis. For further
information conduct the thesis.
This is the second part of the exemplary workflow. Continuing with
loading the object with the saved results from part 1.
Here, an specific region is chosen from the t-SNE plots of the
HLA-C immune gene, where a difference in the expression is
observed between the two HLA-C alleles C*04:01:01:01 and
C*06:02:01:01. Then, differential expression analysis is
performed on the subsetted cluster to determine marker genes. This
allows to find not only regions that show functional differences,
observed by gene expression, but also subtypes of cells that show
genetical differences, shown by differences in expression on the allele
level.
The following package are abundant for performing the downstream analysis.
library(SingleCellAlleleExperiment)
library(scran)
library(scater)
library(gridExtra)
library(tidyverse)
library(patchwork)
library(cowplot)
library(ggplot2)
library(bluster)
redim_scae <- readRDS(file = "~/Work/R/Masterthesis/dev/SCAE/example_workflow_t_110_pca_tsne_perplex.rds")
redim_scae
## class: SingleCellAlleleExperiment
## dim: 29923 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(29923): ENSG00000160072.20 ENSG00000228037.1 ... HLA_class_I
## HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(6): PCA_a PCA_g ... TSNE_g_50 TSNE_f_50
## mainExpName: NULL
## altExpNames(0):
reducedDimNames(redim_scae)
## [1] "PCA_a" "PCA_g" "PCA_f" "TSNE_a_50" "TSNE_g_50" "TSNE_f_50"
Perform nearest-neighbour clustering on TSNE_g_50. Here,
the clusterCells() function from the bluster
package.
marker_scae <- redim_scae
nn.clusters <- clusterCells(marker_scae, use.dimred="TSNE_g_50", BLUSPARAM=NNGraphParam(k=50))
colLabels(marker_scae) <- nn.clusters
#renaming the label columns
colnames(colData(marker_scae))
## [1] "Sample" "Barcode" "sizeFactor" "label"
Changing the labels column in case multiple cluster-runs are computed.
colnames(colData(marker_scae))[4] <- "label_k_50"
colnames(colData(marker_scae))
## [1] "Sample" "Barcode" "sizeFactor" "label_k_50"
Plotting the cluster results on the TSNE_g_50.
plotReducedDim(marker_scae, "TSNE_g_50", colour_by="label_k_50") + theme_bw()
Plot again the HLA-C immune gene with the corresponding
alleles to see which sublcuster contains the in-cluster differences
between the two alleles. The differences appear to be for example in
cluster label 17.
which_tsne <- "TSNE_g_50"
#EGA_hla_c_alleles
tsne_g_c <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "HLA-C")
tsne_g_c1 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "C*04:01:01:01")
tsne_g_c2 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "C*06:02:01:01")
p1 <- tsne_g_c + tsne_g_c1 + tsne_g_c2
p1
Subset the cluster labels of the cluster containing in-cluster differences in the HLA-C plot.
cl_subset <- marker_scae[, colData(marker_scae)$label_k_50 %in% c(1,2,9,27,27,17,20)]
plotReducedDim(cl_subset, "TSNE_g_50", colour_by = "label_k_50") + theme_bw()
Plot again the HLA-C allele expression plots on the subcluster to have a zoomed in view on the chosen subcluster.
hclusc1 <- plotReducedDim(cl_subset, "TSNE_g_50", colour_by = "C*04:01:01:01")
hclusc2 <- plotReducedDim(cl_subset, "TSNE_g_50", colour_by = "C*06:02:01:01")
pclus <- hclusc1 + hclusc2
pclus
To determine marker genes for each cluster region, the
findMarkers() function from scran is
used.
colData(cl_subset)$label_k_50 <- droplevels(colData(cl_subset)$label_k_50)
#markergenes for the subcluster
hclus_markers <- findMarkers(cl_subset, test = "binom", direction = "up", groups = colData(cl_subset)$label_k_50)
hclus_markers
## List of length 6
## names(6): 1 2 9 17 20 27
Return the list of determined top marker genes for each subcluster.
#---------------subcluster1---------------#
interesting_c1 <- hclus_markers[[1]]
top_list_c1 <- interesting_c1[interesting_c1$Top <=5,]
ens_only_c1 <- rownames(top_list_c1[startsWith(rownames(top_list_c1), "ENS"),])
plotExpression(cl_subset, x = "label_k_50", features = ens_only_c1) +
labs(title = "Upregulated markers subcluster 1")
plotReducedDim(cl_subset, "TSNE_g_50", colour_by = ens_only_c1[1]) +
labs(title = "Top marker subcluster 1") + theme_bw()
#---------------subcluster2---------------#
interesting_c2 <- hclus_markers[[2]]
top_list_c2 <- interesting_c2[interesting_c2$Top <=5,]
ens_only_c2 <- rownames(top_list_c2[startsWith(rownames(top_list_c2), "ENS"),])
plotExpression(cl_subset, x = "label_k_50", features = ens_only_c2) +
labs(title = "Upregulated markers subcluster 2")
plotReducedDim(cl_subset, "TSNE_g_50", colour_by = ens_only_c2[1]) +
labs(title = "Top marker subcluster 2") + theme_bw()
#---------------subcluster9-----------#
interesting_c3 <- hclus_markers[[3]]
top_list_c3 <- interesting_c3[interesting_c3$Top <=3,]
ens_only_c3 <- rownames(top_list_c3[startsWith(rownames(top_list_c3), "ENS"),])
plotExpression(cl_subset, x = "label_k_50", features = ens_only_c3) +
labs(title = "Upregulated markers subcluster 9")
plotReducedDim(cl_subset, "TSNE_g_50", colour_by = ens_only_c3[1]) +
labs(title = "Top marker subcluster 9") + theme_bw()
#---------------subcluster17--!!!-------------#
interesting_c4 <- hclus_markers[[4]]
top_list_c4 <- interesting_c4[interesting_c4$Top <=5,]
ens_only_c4 <- rownames(top_list_c4[startsWith(rownames(top_list_c4), "ENS"),])
plotExpression(cl_subset, x = "label_k_50", features = ens_only_c4) +
labs(title = "Upregulated markers subcluster 17")
plotReducedDim(cl_subset, "TSNE_g_50", colour_by = ens_only_c4[2]) +
labs(title = "Top marker subcluster 17") + theme_bw()
#---------------subcluster20---------------#
interesting_c20 <- hclus_markers[[5]]
top_list_c20 <- interesting_c20[interesting_c20$Top <=5,]
ens_only_c20 <- rownames(top_list_c20[startsWith(rownames(top_list_c20), "ENS"),])
plotExpression(cl_subset, x = "label_k_50", features = ens_only_c20) +
labs(title = "Upregulated markers subcluster 20")
plotReducedDim(cl_subset, "TSNE_g_50", colour_by = ens_only_c20[1]) +
labs(title = "Top marker subcluster 20") + theme_bw()
#---------------subcluster27---------------#
interesting_c21 <- hclus_markers[[6]]
top_list_c21 <- interesting_c21[interesting_c21$Top <=5,]
ens_only_c21 <- rownames(top_list_c21[startsWith(rownames(top_list_c21), "ENS"),])
#ens_only_c21
plotExpression(cl_subset, x = "label_k_50", features = ens_only_c21) +
labs(title = "Upregulated markers subcluster 27")
plotReducedDim(cl_subset, "TSNE_g_50", colour_by = ens_only_c21[1]) +
labs(title = "Top marker subcluster 27") + theme_bw()
Overview plot for the marker gene for subcluster label 17.
hclus <- plotReducedDim(cl_subset, "TSNE_g_50", colour_by = "label_k_50") + theme_bw()
c1 <- plotExpression(cl_subset, x = "label_k_50", features = "C*04:01:01:01")
c2 <- plotExpression(cl_subset, x = "label_k_50", features = "C*06:02:01:01")
bach2 <- plotExpression(cl_subset, x = "label_k_50", features="ENSG00000112182.16")
unified_exp <- hclus + theme_bw() + labs(title = "A") + xlab("TSNE_g") + ylab("TSNE_g") +
guides(color=guide_legend("label")) + theme(title = element_text(size = 12)) +
c1 + theme_bw() + labs(title = "B") + xlab("label") + theme(title = element_text(size = 12)) +
bach2 + theme_bw() + labs(title = "C") + xlab("label") + theme(title = element_text(size = 12)) +
c2 + theme_bw() + xlab("label") + theme(title = element_text(size = 12))
unified_exp
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] bluster_1.4.0 cowplot_1.1.1
## [3] patchwork_1.1.3 lubridate_1.9.2
## [5] forcats_1.0.0 stringr_1.5.0
## [7] dplyr_1.1.2 purrr_1.0.2
## [9] readr_2.1.4 tidyr_1.3.0
## [11] tibble_3.2.1 tidyverse_2.0.0
## [13] gridExtra_2.3 scater_1.22.0
## [15] ggplot2_3.4.3 scran_1.22.1
## [17] scuttle_1.4.0 SingleCellAlleleExperiment_0.0.0.9000
## [19] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
## [21] Biobase_2.54.0 GenomicRanges_1.46.1
## [23] GenomeInfoDb_1.30.1 IRanges_2.28.0
## [25] S4Vectors_0.32.4 BiocGenerics_0.40.0
## [27] MatrixGenerics_1.6.0 matrixStats_1.0.0
## [29] BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] ggbeeswarm_0.7.2 colorspace_2.1-0
## [3] XVector_0.34.0 BiocNeighbors_1.12.0
## [5] rstudioapi_0.15.0 farver_2.1.1
## [7] ggrepel_0.9.3 bit64_4.0.5
## [9] AnnotationDbi_1.56.2 fansi_1.0.4
## [11] xml2_1.3.5 sparseMatrixStats_1.6.0
## [13] cachem_1.0.8 knitr_1.43
## [15] jsonlite_1.8.7 cluster_2.1.4
## [17] dbplyr_2.3.3 png_0.1-8
## [19] BiocManager_1.30.22 compiler_4.1.2
## [21] httr_1.4.6 dqrng_0.3.0
## [23] Matrix_1.6-1 fastmap_1.1.1
## [25] limma_3.50.3 cli_3.6.1
## [27] BiocSingular_1.10.0 htmltools_0.5.6
## [29] prettyunits_1.1.1 tools_4.1.2
## [31] rsvd_1.0.5 igraph_1.5.1
## [33] gtable_0.3.3 glue_1.6.2
## [35] GenomeInfoDbData_1.2.7 rappdirs_0.3.3
## [37] Rcpp_1.0.11 jquerylib_0.1.4
## [39] vctrs_0.6.3 Biostrings_2.62.0
## [41] DelayedMatrixStats_1.16.0 xfun_0.40
## [43] beachmat_2.10.0 timechange_0.2.0
## [45] lifecycle_1.0.3 irlba_2.3.5.1
## [47] statmod_1.5.0 XML_3.99-0.14
## [49] org.Hs.eg.db_3.14.0 edgeR_3.36.0
## [51] zlibbioc_1.40.0 scales_1.2.1
## [53] hms_1.1.3 parallel_4.1.2
## [55] yaml_2.3.7 curl_5.0.2
## [57] memoise_2.0.1 sass_0.4.7
## [59] biomaRt_2.50.3 stringi_1.7.12
## [61] RSQLite_2.3.1 highr_0.10
## [63] ScaledMatrix_1.2.0 filelock_1.0.2
## [65] BiocParallel_1.28.3 rlang_1.1.1
## [67] pkgconfig_2.0.3 bitops_1.0-7
## [69] evaluate_0.21 lattice_0.21-8
## [71] labeling_0.4.2 bit_4.0.5
## [73] tidyselect_1.2.0 magrittr_2.0.3
## [75] R6_2.5.1 generics_0.1.3
## [77] metapod_1.2.0 DelayedArray_0.20.0
## [79] DBI_1.1.3 pillar_1.9.0
## [81] withr_2.5.0 KEGGREST_1.34.0
## [83] RCurl_1.98-1.12 crayon_1.5.2
## [85] utf8_1.2.3 BiocFileCache_2.2.1
## [87] tzdb_0.4.0 rmarkdown_2.24
## [89] viridis_0.6.4 progress_1.2.2
## [91] locfit_1.5-9.8 grid_4.1.2
## [93] blob_1.2.4 digest_0.6.33
## [95] munsell_0.5.0 beeswarm_0.4.0
## [97] viridisLite_0.4.2 vipor_0.4.5
## [99] bslib_0.5.1